The three main objectives of this session are:
1. What are the geographic levels in the UK, and how do we access and plot them?
2. What is spatial autocorrelation, and how do we describe it?
3. What is a gravity model, and what data do we need to implement the simplest one?
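It is worth noting that the Statistical Building Blocks are derived from population counts, not areas: each unit is built to hold a population within set limits, so units are small in dense urban areas and large in sparsely populated ones. Below is an overview of the thresholds used to create these geographies.
More about these population-weighted geographies here.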
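You can get geographic data for the UK from the ONS Open Geography Portal via an API call. The first block loads every package used in this session, including some (spdep, reshape2, rsq) that are not needed until later.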
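```r
library(raster)     # raster data handling
library(knitr)      # kable() for printing tables
library(geojsonio)  # geojson_read() for loading GeoJSON
library(sp)         # Spatial* classes for the polygon data
library(tmap)       # thematic (choropleth) maps
library(spdep)      # spatial weights and autocorrelation statistics
library(reshape2)   # reshaping between long and wide tables
library(rsq)        # R-squared for fitted models

# Connect to the Open Geography Portal API and read the region boundaries
regions_json <- geojson_read("https://opendata.arcgis.com/datasets/8d3a9e6e7bd445e2bdcc26cdf007eac7_1.geojson", what = "sp")
# Keep an unmodified copy before any data is attached
z_regions_json <- regions_json
plot(regions_json)
```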
I’m using 2011 Census table WU02EW (Location of usual residence and place of work by age), downloaded from the NOMIS API.
```r
# Connect to the NOMIS API to get the data.
# Note: for convenience I'm only using the ten region-level geographies
# 2013265921...2013265930 (the nine English regions plus Wales).
t_wu02Ew <- read.csv(file = "https://www.nomisweb.co.uk/api/v01/dataset/NM_1206_1.data.csv?date=latest&usual_residence=2013265921...2013265930&place_of_work=2013265921...2013265930&age=0...6&measures=20100", header = TRUE)
kable(head(t_wu02Ew))
```
| DATE | DATE_NAME | DATE_CODE | DATE_TYPE | DATE_TYPECODE | DATE_SORTORDER | USUAL_RESIDENCE | USUAL_RESIDENCE_NAME | USUAL_RESIDENCE_CODE | USUAL_RESIDENCE_TYPE | USUAL_RESIDENCE_TYPECODE | USUAL_RESIDENCE_SORTORDER | PLACE_OF_WORK | PLACE_OF_WORK_NAME | PLACE_OF_WORK_CODE | PLACE_OF_WORK_TYPE | PLACE_OF_WORK_TYPECODE | PLACE_OF_WORK_SORTORDER | AGE | AGE_NAME | AGE_CODE | AGE_TYPE | AGE_TYPECODE | AGE_SORTORDER | MEASURES | MEASURES_NAME | OBS_VALUE | OBS_STATUS | OBS_STATUS_NAME | OBS_CONF | OBS_CONF_NAME | URN | RECORD_OFFSET | RECORD_COUNT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 0 | Aged 16 and over | 0 | Age | 1000 | 0 | 20100 | Value | 936525 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d0d20100 | 0 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 1 | Aged 16-24 | 1 | Age | 1000 | 1 | 20100 | Value | 130827 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d1d20100 | 1 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2 | Aged 25-34 | 2 | Age | 1000 | 2 | 20100 | Value | 199584 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d2d20100 | 2 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 3 | Aged 35-49 | 3 | Age | 1000 | 3 | 20100 | Value | 344176 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d3d20100 | 3 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 4 | Aged 50-64 | 4 | Age | 1000 | 4 | 20100 | Value | 243426 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d4d20100 | 4 | 700 |
| 2011 | 2011 | 2011 | date | 0 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 2013265921 | North East | E12000001 | regions | 480 | 0 | 5 | Aged 65-74 | 5 | Age | 1000 | 5 | 20100 | Value | 15701 | A | Normal Value | FALSE | Free (free for publication) | Nm-1206d1d32176e1d2013265921d2013265921d5d20100 | 5 | 700 |
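The NOMIS query string selects each dimension by its internal ID, and a...b requests every ID from a to b: here 10 regions of residence, 10 places of work and 7 age categories, which matches the RECORD_COUNT of 700. A small helper like the hypothetical build_nomis_url() below makes it easier to change one dimension without editing the long URL by hand. It is a sketch that only pastes strings together; it does not call the API or validate the IDs.

```r
# Hypothetical helper: assemble a NOMIS API data URL from named query parameters.
build_nomis_url <- function(dataset, ..., format = "csv") {
  params <- list(...)
  # name=value pairs joined by "&", in the order they were supplied
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0("https://www.nomisweb.co.uk/api/v01/dataset/", dataset,
         ".data.", format, "?", query)
}

regions <- "2013265921...2013265930"  # nine English regions plus Wales
url <- build_nomis_url(
  "NM_1206_1",
  date = "latest",
  usual_residence = regions,
  place_of_work = regions,
  age = "0...6",
  measures = "20100"
)
url
```

This reproduces the URL used above, so read.csv(url, header = TRUE) would fetch the same table.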
This dataset is a flat 2D table even though it holds multi-way tabulated counts: by origin region (usual residence), destination region (place of work) and age group. To turn it into counts per geographic unit that we can attach to the polygon data, we sum over places of work with the following transformation:
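```r
# names(t_wu02Ew)  # inspect all available columns
# Select only the columns we need for now
ac <- c("USUAL_RESIDENCE_CODE", "AGE_NAME", "OBS_VALUE")
t_wu02Ew_ac <- t_wu02Ew[, ac]
# Sum the counts over all places of work:
# one row per region of residence, one column per age group
reg_var <- xtabs(OBS_VALUE ~ USUAL_RESIDENCE_CODE + AGE_NAME, data = t_wu02Ew_ac)
reg_var <- as.data.frame.matrix(reg_var)
kable(reg_var)
```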
| | Aged 16-24 | Aged 16 and over | Aged 25-34 | Aged 35-49 | Aged 50-64 | Aged 65-74 | Aged 75+ |
|---|---|---|---|---|---|---|---|
| E12000001 | 137047 | 974625 | 208221 | 358197 | 252002 | 16252 | 2906 |
| E12000002 | 386853 | 2705931 | 598669 | 986277 | 666548 | 57777 | 9807 |
| E12000003 | 294930 | 2029907 | 442170 | 739320 | 505157 | 41209 | 7121 |
| E12000004 | 247828 | 1777612 | 372762 | 656200 | 454732 | 39633 | 6457 |
| E12000005 | 290056 | 2106075 | 458558 | 771101 | 526138 | 51665 | 8557 |
| E12000006 | 315871 | 2299955 | 496908 | 834637 | 581497 | 60791 | 10251 |
| E12000007 | 374191 | 3197606 | 1056357 | 1101460 | 591100 | 60959 | 13539 |
| E12000008 | 461335 | 3391170 | 731448 | 1238930 | 852032 | 91961 | 15464 |
| E12000009 | 292428 | 2024395 | 417547 | 716798 | 531510 | 56730 | 9382 |
| W92000004 | 160618 | 1117784 | 238664 | 404170 | 284858 | 25071 | 4403 |
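To see what xtabs() does here, consider a toy long-format table with made-up counts for two hypothetical regions, A and B. Each origin-destination pair appears once per age group, and because PLACE_OF_WORK_CODE is not named in the formula, its counts are summed away.

```r
# Toy long-format table: made-up counts for hypothetical regions "A" and "B"
flows <- data.frame(
  USUAL_RESIDENCE_CODE = c("A", "A", "A", "A", "B", "B", "B", "B"),
  PLACE_OF_WORK_CODE   = c("A", "B", "A", "B", "A", "B", "A", "B"),
  AGE_NAME = c("Aged 16-24", "Aged 16-24", "Aged 65-74", "Aged 65-74",
               "Aged 16-24", "Aged 16-24", "Aged 65-74", "Aged 65-74"),
  OBS_VALUE = c(10, 2, 5, 1, 3, 20, 0, 4)
)

# Variables left out of the formula (here PLACE_OF_WORK_CODE) are summed over
by_origin <- xtabs(OBS_VALUE ~ USUAL_RESIDENCE_CODE + AGE_NAME, data = flows)
by_origin <- as.data.frame.matrix(by_origin)
by_origin
```

Region A gets 10 + 2 = 12 workers aged 16-24, whichever region they work in. Swapping the formula to OBS_VALUE ~ USUAL_RESIDENCE_CODE + PLACE_OF_WORK_CODE would instead sum over age and give the origin-destination matrix of flows.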
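This table gives the working population by place of residence and age group. Note that "Aged 16 and over" is the total of the other six age bands, not a seventh band. For simplicity, we collapse the six bands into two variables: workers aged 65 and over, who could be retired (r_Age), and workers aged 16 to 64, who we can expect to be of working age (w_Age). The total column must be left out of the sum, otherwise those workers would be counted twice:
```r
retired <- c("Aged 65-74", "Aged 75+")
# Working age: every band except the retirement bands and the 16+ total
reg_var$w_Age <- rowSums(reg_var[, !names(reg_var) %in% c(retired, "Aged 16 and over")])
reg_var$r_Age <- rowSums(reg_var[, retired])
```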
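As a sanity check, the two new variables should add up exactly to the "Aged 16 and over" total, which is why that column is excluded from w_Age. The sketch below runs the same calculation on the North East row copied from the table above; if the total column were included, w_Age would come out at roughly twice its correct value.

```r
# North East (E12000001) row copied from the table above
ne <- data.frame(
  "Aged 16-24" = 137047, "Aged 16 and over" = 974625, "Aged 25-34" = 208221,
  "Aged 35-49" = 358197, "Aged 50-64" = 252002, "Aged 65-74" = 16252,
  "Aged 75+" = 2906,
  check.names = FALSE, row.names = "E12000001"
)
retired <- c("Aged 65-74", "Aged 75+")
total <- "Aged 16 and over"

# Working age: all bands except the retirement bands and the total column
ne$w_Age <- rowSums(ne[, !names(ne) %in% c(retired, total)])
ne$r_Age <- rowSums(ne[, retired])

# The two groups should partition the 16+ total exactly
stopifnot(ne$w_Age + ne$r_Age == ne[[total]])
ne[, c("w_Age", "r_Age")]
```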
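Although you could join the tables on region names, joining on geography codes, where available, gives a much cleaner merge: codes are fixed identifiers, whereas names can differ in spelling, capitalisation or punctuation between sources.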
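A small illustration with two hypothetical one-row tables: the same region is spelled with a capital "T" in one source and a lower-case "t" in the other. The name-based join silently drops the region, while the code-based join keeps it. The count of 100 is made up.

```r
# Hypothetical boundary attributes and counts that disagree on capitalisation
boundaries <- data.frame(rgn15cd = "E12000003", rgn15nm = "Yorkshire and The Humber")
counts <- data.frame(code = "E12000003", name = "Yorkshire and the Humber", r_Age = 100)

by_name <- merge(boundaries, counts, by.x = "rgn15nm", by.y = "name")
by_code <- merge(boundaries, counts, by.x = "rgn15cd", by.y = "code")

nrow(by_name)  # 0: the names differ only in case, so nothing matches
nrow(by_code)  # 1
```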
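```r
# Attach the data to the attribute table (data frame component) of the
# Spatial object. reg_var stores the region codes as row names, so turn them
# into a column to join on; sp's merge() method for Spatial objects keeps the
# attributes aligned with the polygons. The rgn15cd regions layer covers
# English regions only, so Wales (W92000004) has no polygon to join to.
regions_json <- merge(regions_json,
                      data.frame(rgn_code = rownames(reg_var), reg_var, check.names = FALSE),
                      by.x = "rgn15cd", by.y = "rgn_code")
```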
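Merging straight into the @data slot with base merge() carries a risk: merge() sorts its result by the key column (sort = TRUE by default), while the rows of @data must stay in the same order as the polygons. The toy example below uses a hypothetical attribute table whose codes are not in sorted order and made-up counts. It shows the reordering, and an order-preserving alternative that looks each code up with match().

```r
# Hypothetical attribute table, in polygon order (not sorted by code)
attrs <- data.frame(rgn15cd = c("E12000007", "E12000001", "E12000004"))
# Made-up counts keyed by region code, stored as row names like reg_var
vals <- data.frame(r_Age = c(1, 2, 3),
                   row.names = c("E12000001", "E12000004", "E12000007"))

# merge() returns rows sorted by the key, no longer in polygon order
merged <- merge(attrs, vals, by.x = "rgn15cd", by.y = 0)
merged$rgn15cd  # "E12000001" "E12000004" "E12000007"

# match() finds each polygon's code among the row names, keeping polygon order
attrs$r_Age <- vals$r_Age[match(attrs$rgn15cd, rownames(vals))]
attrs
```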
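```r
# Switch to interactive (leaflet-based) maps
tmap_mode("view")
## tmap mode set to interactive viewing
# Choropleth of workers aged 65 and over, by region of residence
tm_shape(regions_json) + tm_polygons(col = "r_Age")
```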
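Because the map shades raw counts, regions with more workers overall will tend to stand out regardless of their age profile. A rate is often easier to compare across regions. The sketch below computes the share of workers aged 65 and over for two rows copied from the table above; the column name r_share is an assumption, not part of the original session.

```r
# Two rows copied from the table above: North East and London
counts <- data.frame(
  total = c(974625, 3197606),              # "Aged 16 and over"
  r_Age = c(16252 + 2906, 60959 + 13539),  # "Aged 65-74" + "Aged 75+"
  row.names = c("E12000001", "E12000007")
)
# Share of workers aged 65+ among all workers aged 16 and over
counts$r_share <- counts$r_Age / counts$total
round(100 * counts$r_share, 2)
```

On the merged boundaries, the same idea would be regions_json@data$r_share <- regions_json@data$r_Age / regions_json@data[["Aged 16 and over"]], mapped with tm_polygons(col = "r_share").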